D-GridMST: Clustering Large Distributed Spatial Databases
Abstract
In this paper, we propose a distributable clustering algorithm, called Distributed-GridMST (D-GridMST), for large distributed spatial databases. D-GridMST uses multi-dimensional cubes to partition the data space and applies density criteria to extract representative points from the spatial databases. A global minimum spanning tree (MST) is then constructed over these representatives. This MST is partitioned according to the user's clustering specification and then used to label the data points in each distributed spatial database. Since only compact information about the distributed spatial databases is transferred over the network, D-GridMST incurs low network transfer overhead. Experimental results show that D-GridMST is effective: it produces exactly the same clustering result as the centralized paradigm, which makes it a promising tool for clustering large distributed spatial databases.
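The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the density criterion, the form of the representatives, or the partitioning rule. The sketch assumes that a cell is "dense" when its global point count reaches `min_count`, that each dense cell is represented by its centroid, and that the user's specification is a cluster count `k` obtained by cutting the `k - 1` longest MST edges. All function names and parameters are hypothetical.

```python
import math

def cell_of(point, cell_size):
    """Index tuple of the grid cell (cube) that contains the point."""
    return tuple(math.floor(x / cell_size) for x in point)

def site_summary(points, cell_size):
    """Per-site step: compact summary {cell: (count, coordinate sums)}.
    Only this summary, not the raw points, would be sent over the network."""
    summary = {}
    for p in points:
        cell = cell_of(p, cell_size)
        count, sums = summary.get(cell, (0, (0.0,) * len(p)))
        summary[cell] = (count + 1, tuple(s + x for s, x in zip(sums, p)))
    return summary

def merge_summaries(summaries, min_count):
    """Coordinator step: merge site summaries and keep dense cells only.
    Assumed density rule: global count >= min_count. Returns {cell: centroid}."""
    total = {}
    for summary in summaries:
        for cell, (count, sums) in summary.items():
            c0, s0 = total.get(cell, (0, (0.0,) * len(sums)))
            total[cell] = (c0 + count, tuple(a + b for a, b in zip(s0, sums)))
    return {cell: tuple(s / c for s in sums)
            for cell, (c, sums) in total.items() if c >= min_count}

def mst_edges(reps):
    """Prim's algorithm on the complete Euclidean graph of representatives.
    Returns a list of (weight, cell_u, cell_v) edges."""
    nodes = list(reps)
    if not nodes:
        return []
    start = nodes[0]
    best = {n: (math.dist(reps[start], reps[n]), start) for n in nodes[1:]}
    edges = []
    while best:
        n = min(best, key=lambda c: best[c][0])
        weight, parent = best.pop(n)
        edges.append((weight, parent, n))
        for m in best:
            d = math.dist(reps[n], reps[m])
            if d < best[m][0]:
                best[m] = (d, n)
    return edges

def cut_mst(reps, edges, k):
    """Drop the k - 1 longest MST edges (assumed partitioning rule) and
    label each connected component. Returns {cell: cluster label}."""
    keep = sorted(edges)[: max(0, len(edges) - (k - 1))]
    parent = {n: n for n in reps}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for _, u, v in keep:
        parent[find(u)] = find(v)
    roots = {}
    return {n: roots.setdefault(find(n), len(roots)) for n in reps}

def label_points(points, cell_size, cell_labels):
    """Per-site step: label local points by their cell; -1 marks sparse cells."""
    return [cell_labels.get(cell_of(p, cell_size), -1) for p in points]

def cluster_sites(sites, cell_size, min_count, k):
    """Run the whole sketch; returns one label list per site."""
    reps = merge_summaries([site_summary(s, cell_size) for s in sites], min_count)
    cell_labels = cut_mst(reps, mst_edges(reps), k)
    return [label_points(s, cell_size, cell_labels) for s in sites]
```

In this sketch the density threshold is applied to the merged counts rather than per site, so a cell that is sparse at every individual site can still be dense globally. That is one plausible way to obtain the same result as a centralized run, but it is an assumption here rather than something the abstract states.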
Similar resources
Distributed clustering and local regression for knowledge discovery in multiple spatial databases
Many large-scale spatial data analysis problems involve an investigation of relationships in heterogeneous databases. In such situations, instead of making predictions uniformly across entire spatial data sets, in a previous study we used clustering for identifying similar spatial regions and then constructed local regression models describing the relationship between data characteristics and t...
Clustering for Mining in Large Spatial Databases
In the past few decades, clustering has been widely used in areas such as pattern recognition, data analysis, and image processing. Recently, clustering has been recognized as a primary data mining method for knowledge discovery in spatial databases, i.e. databases managing 2D or 3D points, polygons etc. or points in some d-dimensional feature space. The well-known clustering algorithms, howeve...
A Database Interface for Clustering in Large Spatial Databases
Both the number and the size of spatial databases are rapidly growing because of the large amount of data obtained from satellite images, X-ray crystallography or other scientific equipment. Therefore, automated knowledge discovery becomes more and more important in spatial databases. So far, most of the methods for knowledge discovery in databases (KDD) have been based on relational database s...
Towards Real-Time Geodemographics: Clustering Algorithm Performance for Large Multidimensional Spatial Databases
Geodemographic classifications provide discrete indicators of the social, economic and demographic characteristics of people living within small geographic areas. They have hitherto been regarded as products, which are the final “best” outcome that can be achieved using available data and algorithms. However, reduction in computational cost, increased network bandwidths and increasingly accessi...